info
Returning a "page" of results is split into 2 phases because:
- Find all matching documents (
QUERY) - Combine into single sorted list (Because documents are on different shards) (
FETCH)
Query Phase
- Client sends search request to
coordinating_node(node that receives the request)- Node creates empty PQ of size
from+size
- Node creates empty PQ of size
- Coordinating node forwards requets to shard copy of every shard
- Each shard executes query locally
- Builds local priority queue of size
from+size - Returns result + sort values
coordinating_nodemerges values to produce globally sorted list
Fetch Phase
- Coordinating node identifies which documents need to be fetched and issues a
MGETrqeuest - Each shard loads and enriches documents => returns to
coordinating_node coordinating_nodereturns result to client
warning
Limits of pagination because each shard must build a PQ of from + size
Sorting can get very expensive on very large results ( > 5000 pages)
Can disable sorting with the scan search type
Search Options
Parameters that influence the search process
| Option | Description |
|---|---|
preference | Control which shards / nodes to handle search request (avoid bouncing results problem) |
timeout | Tells coordinating_node how long to wait before returning resutls (returns SOME results) |
routing | Specified to limit which shard to search |
search_type |
|
scan and scroll
Used together to retrieve large numbers of documents efficiently without the overhead of deep pagination
scroll
- Allows initial search to keep pulling batches of results from ES
- Similar to cursor
scan
- Disables global sorting of results (main overhead of deep pagination)
Steps
- Execute search request
- Keep scroll open for 1 minute
_scroll_idreturned (hits not included)
GET /old_index/_search?search_type=scan&scroll=1m
{
"query": {
"match_all": {}
},
"size: 1000
}
- Retrieve first batch of results
_scroll_idpassed in body / query param- Keep the scroll open for another minute
- A new
_scroll_idis returned
GET /_search/scroll?scroll=1m c2Nhbjs1OzExODpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExOTpRNV9aY1VyUVM4U0 NMd2pjWlJ3YWlBOzExNjpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzExNzpRNV9aY1Vy UVM4U0NMd2pjWlJ3YWlBOzEyMDpRNV9aY1VyUVM4U0NMd2pjWlJ3YWlBOzE7dG90YW xfaGl0czoxOw==
note
- Number of results is
size*number_of_primary_shards - No more hits => all documents processed